IN THE CLAIMS

A listing of the claims of the present application is as follows: 

1. (Currently Amended) A multi-modal conversational computing system, the system 
comprising:

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including data associated with a first modality input sensor and data associated with at least a 
second modality input sensor, and the environment including one or more users and one or more 
devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) be capable of making a determination of an intent, a 
focus and a mood of at least one of the one or more users based on at least a portion of the received 
multi-modal input data; and (iii) cause execution of one or more actions to occur in the environment 
based on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data.

2. (Canceled). 

3. (Canceled). 

4. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input
multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal
data including data associated with a first modality input sensor and data associated with at least a 
second modality input sensor, and the environment including one or more users and one or more 
devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the execution of one or more actions in the environment comprises controlling at 
least one of the one or more devices in the environment to at least one of effectuate the determined 
intent, effect the determined focus, and effect the determined mood of the one or more users. 

5. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including data associated with a first modality input sensor and data associated with at least a 
second modality input sensor, and the environment including one or more users and one or more 
devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the execution of one or more actions in the environment comprises controlling at 
least one of the one or more devices in the environment to request further user input to assist in 
making at least one of the determinations. 

6. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including data associated with a first modality input sensor and data associated with at least a 
second modality input sensor, and the environment including one or more users and one or more 
devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received
multi-modal input data; and (iii) cause execution of one or more actions to occur in the environment based
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the execution of the one or more actions comprises initiating a process to at least one 
of further complete, correct, and disambiguate what the system understands from previous input. 

7. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including data associated with a first modality input sensor and data associated with at least a 
second modality input sensor, and the environment including one or more users and one or more 
devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the at least one processor is further configured to abstract the received multi-modal 
input data into one or more events prior to making the one or more determinations. 

8. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including data associated with a first modality input sensor and data associated with at least a 
second modality input sensor, and the environment including one or more users and one or more 
devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data;

wherein the at least one processor is further configured to perform one or more recognition 
operations on the received multi-modal input data prior to making the one or more determinations. 

9. (Currently Amended) A multi-modal conversational computing system, the system
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including data associated with a first modality input sensor and data associated with at least a 
second modality input sensor, and the environment including one or more users and one or more 
devices which are controllable by the multi-modal system; 

an input/output manager module operatively coupled to the user interface subsystem and 
configured to abstract the multi-modal input data into one or more events; 

one or more recognition engines operatively coupled to the input/output manager module and 
configured to perform, when necessary, one or more recognition operations on the abstracted multi- 
modal input data; 

a dialog manager module operatively coupled to the one or more recognition engines and the 
input/output manager module and configured to: (i) receive at least a portion of the abstracted multi- 
modal input data and, when necessary, the recognized multi-modal input data; (ii) make a 
determination of an intent of at least one of the one or more users based on at least a portion of the 
received multi-modal input data; and (iii) cause execution of one or more actions to occur in the 
environment based on the determined intent; 

a focus and mood classification module operatively coupled to the one or more recognition 
engines and the input/output manager module and configured to: (i) receive at least a portion of the 
abstracted multi-modal input data and, when necessary, the recognized multi-modal input data; (ii) 
make a determination of at least one of a focus and a mood of at least one of the one or more users 
based on at least a portion of the received multi-modal input data; and (iii) cause execution of one 
or more actions to occur in the environment based on at least one of the determined focus and mood; 
and 

a context stack memory operatively coupled to the dialog manager module, the one or more 
recognition engines and the focus and mood classification module, which stores at least a portion 
of results associated with the intent, focus and mood determinations made by the dialog manager and
the classification module for possible use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data.

10. (Currently Amended) A computer-based conversational computing method, the method 
comprising the steps of: 

obtaining multi-modal data from an environment including one or more users and one or 
more controllable devices, the multi-modal data including data associated with a first modality input 
sensor and data associated with at least a second modality input sensor; 

providing for a capability to make a determination of an intent, a focus and a mood of at least 
one of the one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; and 

storing at least a portion of results associated with the intent, focus and mood determinations 
for possible use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data.

11. (Canceled). 

12. (Canceled). 

13. (Currently Amended) A computer-based conversational computing method, the method
comprising the steps of: 

obtaining multi-modal data from an environment including one or more users and one or
more controllable devices, the multi-modal data including data associated with a first modality input 
sensor and data associated with at least a second modality input sensor; 

making a determination of at least one of an intent, a focus and a mood of at least one of the 
one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; and 

storing at least a portion of results associated with the intent, focus and mood determinations 
for possible use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the step of causing the execution of one or more actions in the environment 
comprises controlling at least one of the one or more devices in the environment to at least one of 
effectuate the determined intent, effect the determined focus, and effect the determined mood of the 
one or more users. 

14. (Currently Amended) A computer-based conversational computing method, the method 
comprising the steps of: 

obtaining multi-modal data from an environment including one or more users and one or 
more controllable devices, the multi-modal data including data associated with a first modality input 
sensor and data associated with at least a second modality input sensor; 

making a determination of at least one of an intent, a focus and a mood of at least one of the 
one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; and 

storing at least a portion of results associated with the intent, focus and mood determinations 
for possible use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the step of causing the execution of one or more actions in the environment 
comprises controlling at least one of the one or more devices in the environment to request further 
user input to assist in making at least one of the determinations. 

15. (Currently Amended) A computer-based conversational computing method, the method
comprising the steps of: 

obtaining multi-modal data from an environment including one or more users and one or 
more controllable devices, the multi-modal data including data associated with a first modality input 
sensor and data associated with at least a second modality input sensor; 

making a determination of at least one of an intent, a focus and a mood of at least one of the 
one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; and 

storing at least a portion of results associated with the intent, focus and mood determinations 
for possible use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the step of causing the execution of the one or more actions comprises initiating a 
process to at least one of further complete, correct, and disambiguate what the system understands 
from previous input. 

16. (Currently Amended) A computer-based conversational computing method, the method
comprising the steps of: 

obtaining multi-modal data from an environment including one or more users and one or
more controllable devices, the multi-modal data including data associated with a first modality input 
sensor and data associated with at least a second modality input sensor; 

making a determination of at least one of an intent, a focus and a mood of at least one of the 
one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; and 

storing at least a portion of results associated with the intent, focus and mood determinations 
for possible use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the method further comprises the step of abstracting the received multi-modal input data into
one or more events prior to making the one or more determinations. 

17. (Currently Amended) A computer-based conversational computing method, the method 
comprising the steps of: 

obtaining multi-modal data from an environment including one or more users and one or 
more controllable devices, the multi-modal data including data associated with a first modality input 
sensor and data associated with at least a second modality input sensor; 

making a determination of at least one of an intent, a focus and a mood of at least one of the 
one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; 

storing at least a portion of results associated with the intent, focus and mood determinations 
for possible use in a subsequent determination; and 

performing one or more recognition operations on the received multi-modal input data prior 
to making the one or more determinations;

wherein the intent determination comprises resolving referential ambiguity associated with
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data.

18. (Currently Amended) An article of manufacture for performing conversational 
computing, comprising a machine readable medium containing one or more programs which when 
executed implement the steps of: 

obtaining multi-modal data from an environment including one or more users and one or 
more controllable devices, the multi-modal data including data associated with a first modality input 
sensor and data associated with at least a second modality input sensor; 

providing for a capability to make a determination of an intent, a focus and a mood of at least 
one of the one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; and 

storing at least a portion of results associated with the intent, focus and mood determinations 
for possible use in a subsequent determination;

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data.

19. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including at least audio-based data and image-based data, and the environment including one 
or more users and one or more devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) be capable of making a determination of an intent, a
focus and a mood of at least one of the one or more users based on at least a portion of the received 
multi-modal input data; and (iii) cause execution of one or more actions to occur in the environment 
based on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data.

20. (Canceled). 

21. (Canceled). 

22. (Original) The system of claim 19, wherein the user interface subsystem comprises one 
or more image capturing devices, deployed in the environment, for capturing the image-based data. 

23. (Original) The system of claim 22, wherein the image-based data is at least one of in the 
visible wavelength spectrum and not in the visible wavelength spectrum. 

24. (Original) The system of claim 22, wherein the image-based data is at least one of video, 
infrared, and radio frequency-based image data. 

25. (Original) The system of claim 19, wherein the user interface subsystem comprises one 
or more audio capturing devices, deployed in the environment, for capturing the audio-based data. 

26. (Original) The system of claim 25, wherein the one or more audio capturing devices
comprise one or more microphones. 

27. (Original) The system of claim 19, wherein the user interface subsystem comprises one 
or more graphical user interface-based input devices, deployed in the environment, for capturing 
graphical user interface-based data. 

28. (Original) The system of claim 19, wherein the user interface subsystem comprises a 
stylus-based input device, deployed in the environment, for capturing handwritten-based data. 

29. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including at least audio-based data and image-based data, and the environment including one 
or more users and one or more devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data;

wherein the execution of one or more actions in the environment comprises controlling at
least one of the one or more devices in the environment to at least one of effectuate the determined 
intent, effect the determined focus, and effect the determined mood of the one or more users. 

30. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including at least audio-based data and image-based data, and the environment including one 
or more users and one or more devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data;

wherein the execution of one or more actions in the environment comprises controlling at 
least one of the one or more devices in the environment to request further user input to assist in 
making at least one of the determinations. 

31. (Currently Amended) A multi-modal conversational computing system, the system
comprising: 

a user interface subsystem, the user interface subsystem being configured to input
multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal
data including at least audio-based data and image-based data, and the environment including one 
or more users and one or more devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data; 

wherein the at least one processor is further configured to abstract the received multi-modal 
input data into one or more events prior to making the one or more determinations. 

32. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including at least audio-based data and image-based data, and the environment including one 
or more users and one or more devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user 
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received
multi-modal input data; and (iii) cause execution of one or more actions to occur in the environment based
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data;

wherein the at least one processor is further configured to perform one or more recognition 
operations on the received multi-modal input data prior to making the one or more determinations. 

33. (Original) The system of claim 32, wherein one of the one or more recognition 
operations comprises speech recognition. 

34. (Original) The system of claim 32, wherein one of the one or more recognition 
operations comprises speaker recognition. 

35. (Original) The system of claim 32, wherein one of the one or more recognition 
operations comprises gesture recognition. 

36. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input multi- 
modal data from an environment in which the user interface subsystem is deployed, the multi-modal 
data including at least audio-based data and image-based data, and the environment including one 
or more users and one or more devices which are controllable by the multi-modal system; 

at least one processor, the at least one processor being operatively coupled to the user
interface subsystem and being configured to: (i) receive at least a portion of the multi-modal input 
data from the user interface subsystem; (ii) make a determination of at least one of an intent, a focus 
and a mood of at least one of the one or more users based on at least a portion of the received multi- 
modal input data; and (iii) cause execution of one or more actions to occur in the environment based 
on at least one of the determined intent, the determined focus and the determined mood; and 

memory, operatively coupled to the at least one processor, which stores at least a portion of 
results associated with the intent, focus and mood determinations made by the processor for possible 
use in a subsequent determination; 

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data;

wherein the execution of the one or more actions comprises initiating a process to at least one 
of further complete, correct, and disambiguate what the system understands from previous input. 

37. (Currently Amended) A multi-modal conversational computing system, the system 
comprising: 

a user interface subsystem, the user interface subsystem being configured to input
multi-modal data from an environment in which the user interface subsystem is deployed, the multi-modal
data including at least audio-based data and image-based data, and the environment including one 
or more users and one or more devices which are controllable by the multi-modal system; 

an input/output manager module operatively coupled to the user interface subsystem and 
configured to abstract the multi-modal input data into one or more events; 

one or more recognition engines operatively coupled to the input/output manager module and 
configured to perform, when necessary, one or more recognition operations on the abstracted multi- 
modal input data; 

a dialog manager module operatively coupled to the one or more recognition engines and the 
input/output manager module and configured to: (i) receive at least a portion of the abstracted
multi-modal input data and, when necessary, the recognized multi-modal input data; (ii) make a
determination of an intent of at least one of the one or more users based on at least a portion of the 
received multi-modal input data; and (iii) cause execution of one or more actions to occur in the 
environment based on the determined intent; 

a focus and mood classification module operatively coupled to the one or more recognition 
engines and the input/output manager module and configured to: (i) receive at least a portion of the 
abstracted multi-modal input data and, when necessary, the recognized multi-modal input data; (ii) 
make a determination of at least one of a focus and a mood of at least one of the one or more users 
based on at least a portion of the received multi-modal input data; and (iii) cause execution of one
or more actions to occur in the environment based on at least one of the determined focus and mood; 
and 

a context stack memory operatively coupled to the dialog manager module, the one or more 
recognition engines and the focus and mood classification module, which stores at least a portion 
of results associated with the intent, focus and mood determinations made by the dialog manager and 
the classification module for possible use in a subsequent determination;

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data.

38. (Currently Amended) A computer-based conversational computing method, the method 
comprising the steps of: 

obtaining multi-modal data from an environment including one or more users and one or 
more controllable devices, the multi-modal data including at least audio-based data and image-based
data; 

providing for a capability to make a determination of an intent, a focus and a mood of at least 
one of the one or more users based on at least a portion of the obtained multi-modal input data; 

causing execution of one or more actions to occur in the environment based on at least one 
of the determined intent, the determined focus and the determined mood; and 

storing at least a portion of results associated with the intent, focus and mood determinations
for possible use in a subsequent determination;

wherein the intent determination comprises resolving referential ambiguity associated with 
the one or more users and the one or more devices in the environment based on at least a portion of 
the received multi-modal data. 